Abstract. We develop a new notion of security against timing attacks where the attacker
is able to simultaneously observe the execution time of a program and the probability of
the values of low variables. We then show how to measure the security of a program with
respect to this notion via a computable estimate of the timing leakage and use this estimate
for cost optimisation.



1 Introduction 



Early work on language-based security, such as Volpano and Smith's type systems [1], precluded 
the use of high security variables to affect control flow. Specifically, the conditions in if-commands 
and while-commands were restricted to using only low security information. If this restriction is 
weakened, it opens up the possibility that high security data may be leaked through the different 
timing behaviour of alternative control paths. This kind of leakage of information is said to form
a covert timing channel and is a serious threat to the security of programs (cf. e.g. [2]).

We develop a new notion of security against timing attacks where the attacker is able to
simultaneously observe the execution time of a (probabilistic) program and the probability of
the values of low variables. This notion is a non-trivial extension of similar ideas for deterministic
programs [3] which also covers attacks based on the combined observation of time and low variables.
This earlier work presents an approach which, having identified a covert timing channel, provides
a program transformation which neutralises the channel.

We start by introducing a semantic model of timed probabilistic transition systems. Our approach
is based on modelling programs essentially as Markov Chains (MC) where the stochastic
behaviour is determined by a joint distribution on both the values assigned to the program's
variables and the time it takes the program to perform a given command. This is very different
from other approaches in the area of automata theory which also deal with both time
and probability. In this area the timed automata constitute a well-established model [4]. These 
automata have been extended with probability and used in model-checking for the verification 
of probabilistic timed temporal logic properties of real-time systems [5]. The resulting model is 
essentially a Markov Decision Process where rewards are interpreted as time durations and is 
therefore quite different from our MC approach. In particular, the presence of non-determinism 
makes it not very appropriate as a base of our quantitative analysis aiming at measuring timing 
leaks. We next present a concrete programming language with a timed probabilistic transition sys- 
tem as its execution model. This language is based on the language studied in [3] but is extended 
with a probabilistic choice construct - whilst this may not play a role in user programs, it has an 
essential role in our program transformation. In order to determine and quantify the security of 
systems and the effectiveness of potential counter-measures against timing attacks we then discuss 
an approximate notion of timed bisimilarity and construct an algorithm for computing a quanti- 
tative estimate of the vulnerability of a system against timing attacks; this is given in terms of 
the mismatch between the actual transition probabilities and those of an ideal perfectly confined 
program. Finally, we present a probabilistic variation of Agat's padding algorithm which we use 
to illustrate - via an example - a technique for formally analysing the trade-off between security 
costs and protection. 



2 The Model 

We introduce a general model for the semantics of programs where time and probability are explic- 
itly introduced in order to keep track of both the probabilistic evolution of the program/system 
state and its running time. 

The scenario we have in mind is that of a multilevel security system and an attacker who can 
observe the system looking at the values of its public variables and the time it takes to perform a 
given operation or before terminating, or other similar properties related to its timing behaviour. 

In order to keep the model simple, we assume that the time to execute a statement is constant 
and that there is no distinction between any 'local' and 'global' clocks. In a more realistic model, 
one has - of course - to take into account also that the execution speed might differ depending on 
which other process is running on the same system and/or delays due to uncontrollable events in 
the communication infrastructure, i.e. network. 

Our reference model is the timed probabilistic transition system we define below. The intu- 
itive idea is that of a probabilistic transition system (similar to those defined in all generality 
in [6]) where transition probabilities are defined by a joint distribution of two random variables 
representing the variable updates and time, respectively. 

Let us consider a finite set $X$, and let $Dist(X)$ denote the set of all probability distributions
on $X$, that is the set of all functions $\pi : X \to [0, 1]$ such that $\sum_{x \in X} \pi(x) = 1$. We often represent
these functions as sets of tuples $\{(x, \pi(x))\}_{x \in X}$. If the set $X$ is presented as a Cartesian product,
i.e. $X = X_1 \times X_2$, then we refer to a distribution on $X$ also as a joint distribution on $X_1$ and $X_2$. A
joint distribution associates to each pair $(x_1, x_2)$, with $x_1 \in X_1$, $x_2 \in X_2$, the probability $\pi(x_1, x_2)$.
It is important to point out that, in general, it is not possible to define any joint distribution on
$X_1 \times X_2$ as a 'product' of distributions on $X_1$ and $X_2$, i.e. for a given joint distribution $\pi$ on
$X = X_1 \times X_2$ it is, in general, not possible to find distributions $\pi_1$ and $\pi_2$ on $X_1$ and $X_2$ such that
for all $(x_1, x_2) \in X_1 \times X_2$ we have $\pi(x_1, x_2) = \pi_1(x_1)\,\pi_2(x_2)$. In the special cases where a joint
distribution $\pi$ can be expressed in this way, as a 'product', we say that the distributions $\pi_1$ and
$\pi_2$ are independent (cf. e.g. [7]).
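As a small worked illustration (standard probability facts, not taken from the text above): if a joint distribution does factorise, its factors must be its marginals, so checking for a product form only requires comparing $\pi$ with the product of its marginals. The two-point sets used in the example are an assumption chosen for brevity.

```latex
\pi_1(x_1) = \sum_{x_2 \in X_2} \pi(x_1, x_2), \qquad
\pi_2(x_2) = \sum_{x_1 \in X_1} \pi(x_1, x_2), \qquad
\pi \text{ is a product} \iff \pi(x_1, x_2) = \pi_1(x_1)\,\pi_2(x_2)
\ \text{ for all } (x_1, x_2).

% Example with X_1 = X_2 = \{0, 1\}:
\pi(0,0) = \pi(1,1) = \tfrac12,\ \pi(0,1) = \pi(1,0) = 0
\ \Longrightarrow\ \pi_1(0)\,\pi_2(0) = \tfrac12 \cdot \tfrac12 = \tfrac14 \neq \tfrac12 = \pi(0,0).
```

In this example the marginals are uniform, yet the joint distribution is not their product.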

2.1 Timed Probabilistic Transition Systems 

The execution model of programs which we will use in the following is that of a labelled transition 
system; more precisely, we will consider probabilistic transition systems (PTS) . We will put labels 
on transitions as well as states; the former will have "times" associated with them while the 
latter will be labelled by uninterpreted entities which are intended to represent the values of (low 
security) variables, i.e. the computational state during the execution of a program. We will not
specify what kind of "time labels" we use - e.g. whether we have a discrete or continuous time
model - we just assume that time labels are taken from a finite set $T \subseteq \mathbb{R}^+$ of positive real
numbers. The "state labels" will be taken from an abstract set which we denote by $L$.

Definition 1. We define a timed Probabilistic Transition System with labelled states, or tPTS,
as a triple $(S, \to, \lambda)$, with $S$ a finite set of states, $\to\ \subseteq S \times T \times [0, 1] \times S$ a probabilistic transition
relation, and $\lambda : S \to L$ a state labelling function.

We denote by $s_1 \xrightarrow{p:t} s_2$ the fact that $(s_1, t, p, s_2) \in\ \to$ with $s_1, s_2 \in S$, $p \in [0, 1]$ and $t \in T$.

In a general tPTS we can have non-determinism in the sense that for two states $s_1$, $s_2$ we may
have $s_1 \xrightarrow{1:t_1} s_2$ and $s_1 \xrightarrow{1:t_2} s_2$, which would suggest that it is possible to make a transition
from $s_1$ to $s_2$ in different times ($t_1$ and $t_2$) and probability 1, i.e. certainly. In order to eliminate
non-determinism we will consider in this paper only tPTS's which are subject to the following
conditions:

1. for all $s \in S$ we have $\sum_{(s, t_i, p_i, s_i) \in \to} p_i = 1$, and

2. for all $t \in T$ there is at most one tuple $(s_1, t, p, s_2) \in\ \to$.



The first condition means that we consider here a purely probabilistic or generative execution
model. The second condition allows us to associate a unique probability to every transition time
between two states, i.e. triple $(s_1, t, s_2)$; this means that we can define a function
$\pi : S \times T \times S \to [0, 1]$ such that $s_1 \xrightarrow{p:t} s_2$ iff $\pi(s_1, t, s_2) = p$.
Note however, that it is still possible to have
differently timed transitions between states, i.e. it is possible to have $(s_1, t_1, p_1, s_2) \in\ \to$ and
$(s_1, t_2, p_2, s_2) \in\ \to$ with $t_1 \neq t_2$.

If for all $s_1, s_2 \in S$ there exists at most one $(s_1, t, p, s_2) \in\ \to$, we can also represent a timed
Probabilistic Transition System with labelled states as a quadruple $(S, \to, \tau, \lambda)$ with
$\tau : S \times S \to [0, 1] \times T$ a timing function. Thus, to any two states $s_1$ and $s_2$ we associate a unique transition
time $t_{s_1,s_2}$ and probability $p_{s_1,s_2}$.
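The two conditions on a tPTS can be checked mechanically. The following is a minimal Python sketch, not part of the original development: the function name `check_tpts` and the encoding of the transition relation as a list of `(s1, t, p, s2)` tuples are assumptions made for illustration. States without outgoing transitions are treated as terminal and are exempt from the first condition, which is also an assumption.

```python
from collections import defaultdict
from math import isclose


def check_tpts(transitions):
    """Check the two tPTS conditions on a list of (s1, t, p, s2) tuples.

    Returns (generative, unique):
      generative - outgoing probabilities of every non-terminal state sum to 1
      unique     - at most one tuple per triple (s1, t, s2)
    """
    out_prob = defaultdict(float)   # total outgoing probability per source state
    seen = set()                    # triples (s1, t, s2) encountered so far
    unique = True
    for s1, t, p, s2 in transitions:
        out_prob[s1] += p
        if (s1, t, s2) in seen:
            unique = False
        seen.add((s1, t, s2))
    generative = all(isclose(total, 1.0) for total in out_prob.values())
    return generative, unique
```

For instance, two transitions from `a` to `b` with times 1 and 2 and probability 0.5 each satisfy both conditions, whereas the same two transitions with probability 1 each violate the first one.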

Definition 2. Consider a tPTS $(S, \to, \lambda)$ and an initial state $s_0 \in S$. An execution sequence or
trace starting in $s_0$ is a sequence $(s_0, s_1, \ldots)$ such that $s_i \xrightarrow{p_i:t_i} s_{i+1}$ for all $i = 0, 1, 2, \ldots$

We associate, in the obvious way, to an execution sequence $\sigma = (s_0, s_1, \ldots)$ three more sequences:
(i) the transition probability sequence: $(p_1, p_2, \ldots)$, (ii) a time stamp sequence: $(t_1, t_2, \ldots)$,
and (iii) a state label sequence: $(\lambda(s_0), \lambda(s_1), \ldots)$.

Even for a tPTS with a finite number of states it is possible to have infinite execution sequences.
It is thus, in general, necessary to consider measure theoretic notions in order to define
a mathematically sound model for the possible behaviours of a tPTS. However, as long as we
consider only terminating systems, i.e. finite traces, things are somewhat simpler. In particular,
in this case, probability distributions can replace measures as they are equivalent. In this paper
we restrict our attention to terminating traces and probability distributions. This allows us to
define for every finite execution sequence $\sigma = (s_0, s_1, \ldots)$ its running time as $\tau(\sigma) = \sum_i t_i$, and its
execution probability as $\pi(\sigma) = \prod_i p_i$. We will also associate to every state $s_0$ its execution tree, i.e.
the collection of all execution sequences starting in $s_0$.
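The running time and execution probability of a finite trace are a sum and a product over its transition labels. A minimal sketch, assuming a trace is given as a list of `(p, t)` label pairs (this encoding and the function names are illustrative, not from the paper):

```python
from math import prod


def running_time(steps):
    """Sum of the time stamps t_i along a finite trace."""
    return sum(t for _, t in steps)


def execution_probability(steps):
    """Product of the transition probabilities p_i along a finite trace."""
    return prod(p for p, _ in steps)
```

A trace with labels `(0.5, 1.0)`, `(1.0, 2.0)`, `(0.5, 1.0)` therefore has running time 4.0 and execution probability 0.25.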

2.2 Observing tPTS's 

In Section 3 we will present an operational semantics of a simple imperative programming language, 
pWhile, via a tPTS. Based on this model we will then investigate the vulnerability against attackers 
who are able to observe (i) the time, and (ii) the state labels, i.e. the low variables. In this setting 
we will argue that the combined observation of time and low variables is more powerful than the 
observation of time and low variables separately. The following example aims to illustrate this 
aspect which comes from the properties of joint probability distributions. 

Example 1. In order to illustrate the role of joint distributions in the observation of timed PTS's 
let us consider the following simple systems. 





We assume that the attacker can observe the execution times and that he/she is also able to
(partially) distinguish (the final) states. In our example we assume that the states depicted as •
and o form two classes which the attacker can identify (e.g. because • and o states have the same
values for low variables). The question now is whether this information allows the attacker to
distinguish the two tPTS's.

If we consider the information obtained by observing the running time, we see that both
systems exhibit the same time behaviour corresponding to the distribution $\{(1, \tfrac12), (2, \tfrac12)\}$ over
$T = \{1, 2\}$. The same is true in the case where the information is obtained by inspecting the final
states: we have the distributions $\{(\bullet, \tfrac12), (o, \tfrac12)\}$ over $L = \{\bullet, o\}$ for both systems.



However, considering that the attacker can observe running time and labels simultaneously, we 
see that the system on the right hand side always runs for 2 time steps iff it ends up in a • state 
and 1 time step iff it ends up in a o state. In the system on the left hand side there is no such 
correlation between running time and final state. The difference between the two systems, which 
allows an attacker to distinguish them, is reflected in the joint distributions over T x L. These can 
be expressed in matrix form for the two systems above as: 
[matrices $\chi_1(t, l)$ and $\chi_2(t, l)$]

Note that while $\chi_1$ is the product of two independent probability distributions on $T$ and $L$ it is
not possible to represent $\chi_2$ in the same way.
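The point of Example 1 can be replayed numerically. The sketch below is illustrative only and rests on assumptions: both marginals are taken to be uniform (probability 1/2 each), the left system is taken to be the product of these marginals (1/4 per entry), and the right system couples time 2 with • and time 1 with o, as described above. The labels `"black"` and `"white"` stand for • and o.

```python
from fractions import Fraction

HALF, QUARTER = Fraction(1, 2), Fraction(1, 4)

# Assumed joint distributions over (time, label) for the two systems.
chi1 = {(1, "black"): QUARTER, (1, "white"): QUARTER,
        (2, "black"): QUARTER, (2, "white"): QUARTER}
chi2 = {(2, "black"): HALF, (1, "white"): HALF}


def marginals(chi):
    """Return the marginal distributions on time and on labels."""
    time, label = {}, {}
    for (t, l), p in chi.items():
        time[t] = time.get(t, 0) + p
        label[l] = label.get(l, 0) + p
    return time, label


def is_product(chi):
    """True iff chi equals the product of its own marginals."""
    time, label = marginals(chi)
    return all(chi.get((t, l), 0) == time[t] * label[l]
               for t in time for l in label)
```

Under these assumptions the two systems have identical marginals, so separate observation of time and of labels cannot tell them apart, while only the first joint distribution is a product.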

3 An Imperative Language 

We consider a language similar to that used in [3] with the addition of a probabilistic choice 
construct. The syntax of the language is as follows: 



Operators:     op   ::= - | = | != | < | <=
Expressions:   e    ::= v | x | e op e
Commands:      C, D ::= x := e | skipAsn x e | if (e) then C else D | skipIf e C
                      | while (e) do C | C; D | choose^p C or D
Basic Values:  v    ::= n | true | false



The probabilistic choice is used in an essential way in the program transformation presented later. 
We also keep the language of types in [3], although in a simplified form:

Security levels:  $s ::= L \mid H$  (with $L \le H$ and $s \le s$)
Base types:       $\tau ::= Int \mid Bool$
Security types:   $\tau_s$

and sub-typing relation:

$$\frac{s_1 \le s_2}{\tau_{s_1} \le \tau_{s_2}}$$



We will indicate by $E$ the state of a computation and denote by $E_L$ its restriction to low
variables, i.e. a state which is defined as $E$ for all the low variables for which $E$ is defined, and
is undefined otherwise. We say that two configurations $\langle E \mid C \rangle$ and $\langle E' \mid C' \rangle$ are low equivalent
if and only if $E_L = E'_L$ and we indicate this by $\langle E \mid C \rangle =_L \langle E' \mid C' \rangle$. In the following we will
sometimes use for configurations the shorthand notation $c, c_1, c_2, \ldots, c', c'_1, \ldots$. We will also denote
by Conf the set of all configurations.

3.1 SOS Semantics 

The operational semantics of pWhile - except for the probabilistic choice construct - follows
essentially the one presented in [3]. For the convenience of the reader we present here all the
rules which are based on the big step semantics for expressions (where $[\![op]\!]$ represents the usual
semantics of operators):

$$E \vdash v \Downarrow v \qquad
\frac{E(x) = v}{E \vdash x \Downarrow v} \qquad
\frac{E \vdash e_1 \Downarrow v_1 \quad E \vdash e_2 \Downarrow v_2}{E \vdash e_1\ op\ e_2 \Downarrow v_1\, [\![op]\!]\, v_2}$$

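The small step semantics is then defined as a timed PTS via the SOS rules in Table 1.
The time labels represent the time it takes to perform certain operations: $t_x$ is the time
to store a variable, $t_e$ is the time it takes to evaluate an expression, $t_{asn}$ represents the time
to perform an assignment, $t_{br}$ is the time required for a branching step, and $t_{ch}$ is the time to
perform a probabilistic choice. By $ts$ we denote any sequence of time labels and with $\surd$ we indicate
termination.

(Assign)   $\dfrac{E \vdash e \Downarrow v}{\langle E \mid x := e \rangle \xrightarrow{\ \cdots\ } E[x = v]}$

(Seq)      $\dfrac{\langle E \mid C \rangle \xrightarrow{\ \cdots\ } E'}{\langle E \mid C; D \rangle \xrightarrow{\ \cdots\ } \langle E' \mid D \rangle}$
           $\dfrac{\langle E \mid C \rangle \xrightarrow{p:ts} \langle E' \mid C' \rangle}{\langle E \mid C; D \rangle \xrightarrow{\ \cdots\ } \langle E' \mid C'; D \rangle}$

(If)       $\dfrac{E \vdash e \Downarrow true}{\langle E \mid \text{if } (e) \text{ then } C \text{ else } D \rangle \xrightarrow{\ \cdots\ } \langle E \mid C \rangle}$
           $\dfrac{E \vdash e \Downarrow false}{\langle E \mid \text{if } (e) \text{ then } C \text{ else } D \rangle \xrightarrow{\ \cdots\ } \langle E \mid D \rangle}$

(SkipAsn)  $\dfrac{E \vdash e \Downarrow v}{\langle E \mid \text{skipAsn } x\ e \rangle \xrightarrow{\ \cdots\ } E}$

(SkipIf)   $\dfrac{E \vdash e \Downarrow v \quad v \in \{true, false\}}{\langle E \mid \text{skipIf } e\ C \rangle \xrightarrow{\ \cdots\ } \langle E \mid C \rangle}$

(While)    $\dfrac{E \vdash e \Downarrow false}{\langle E \mid \text{while } (e) \text{ do } C \rangle \xrightarrow{\ \cdots\ } E}$
           $\dfrac{E \vdash e \Downarrow true}{\langle E \mid \text{while } (e) \text{ do } C \rangle \xrightarrow{\ \cdots\ } \langle E \mid C; \text{while } (e) \text{ do } C \rangle}$

(Choose)   $\langle E \mid \text{choose}^p\ C \text{ or } D \rangle \xrightarrow{p:t_{ch}} \langle E \mid C \rangle$
           $\langle E \mid \text{choose}^p\ C \text{ or } D \rangle \xrightarrow{(1-p):t_{ch}} \langle E \mid D \rangle$

Table 1. Operational Semantics (transition labels that are illegible in the source are written as $\cdots$)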

The rule (Choose) is the only new rule with respect to the original semantics in [3]. It states
that the execution of a probabilistic choice construct leads, after a time $t_{ch}$, to a state where either
the command $C$ or the command $D$ is executed with probability $p$ or $1 - p$, respectively. This rule
together with the standard transition rules for the other constructs of the language defines a tPTS
for our pWhile language according to Definition 1. In this tPTS, the state labels are given by the
environment, i.e. $\lambda(\langle E \mid C \rangle) = E$.
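The effect of the (Choose) rule can be sketched as a function that returns the two outgoing transitions of a choice configuration. This is a hypothetical encoding for illustration: the name `step_choose`, the representation of a configuration as a pair `(env, command)` and the default value of `t_ch` are assumptions.

```python
def step_choose(env, p, c, d, t_ch=1.0):
    """Outgoing transitions of <env | choose^p c or d>.

    Each transition is a triple (probability, time, configuration);
    both branches take the same time t_ch and leave env unchanged.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability in [0, 1]")
    return [(p, t_ch, (env, c)), (1.0 - p, t_ch, (env, d))]
```

The two probabilities always sum to one, so the rule preserves the generative condition of Definition 1.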



3.2 Abstract Semantics 

According to the notion of security we consider in this paper, an observer or attacker can only
observe the changes in low variables. Therefore, we can simplify the semantics by 'collapsing' the
execution tree in such a way that execution steps during which the value of all low variables is
unchanged are combined into one single step. We call an execution sequence $\sigma$ deterministic if
$\pi(\sigma) = 1$, and we call it low stable if $\lambda(s_i)|_L = l$ for all $s_i \in \sigma$. The empty path (of length zero)
is by definition deterministic and low stable. An execution sequence is maximal deterministic/low
stable if it is not a proper sub-sequence of another deterministic/low stable path.



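Definition 3. We define the collapsed transition relation by: $\langle E_1 \mid C_1 \rangle \xRightarrow{p:T} \langle E_2 \mid C_2 \rangle$ iff

(i) there exists a configuration $\langle E'_1 \mid C'_1 \rangle$ such that $\langle E_1 \mid C_1 \rangle \xrightarrow{p:t} \langle E'_1 \mid C'_1 \rangle$,

(ii) the path $\langle E'_1 \mid C'_1 \rangle \xrightarrow{1:t_1} \ldots \xrightarrow{1:t_{n-1}} \langle E'_2 \mid C'_2 \rangle \xrightarrow{1:t_n} \langle E_2 \mid C_2 \rangle$ is deterministic,

(iii) the path $\langle E'_1 \mid C'_1 \rangle \to \ldots \to \langle E'_2 \mid C'_2 \rangle$ is maximal low stable,

(iv) and $T = t + \sum_{i=1}^{n} t_i$.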



This is illustrated in the following example. In the depicted execution trees we indicate in the 
nodes only the state and omit the program parts of the corresponding configurations. Moreover, 
we use the notation [n, m] for the state E where h has value n and I has value m. 
The collapsed execution tree on the right hand side represents in effect what an attacker can
actually observe during the program execution (for our analysis of the situation we still record the 
value of h although it is invisible to the attacker). 

4 Bisimulation and Timing Leaks 

Observing the low variables and the running time separately is not the same as observing them 
together; a correlation between the two random variables (probability and time) has to be taken 
into account (cf. Section 2). A naive probabilistic extension of the P-bisimulation notion introduced 
in [3] might not take this into account. More precisely, this may happen if time and probability 
are treated as two independent aspects which are observed separately in a mutually exclusive way.
According to such a notion an attacker must set up two different covert channels if she wants 
to exploit possible interference through both the probabilistic and the timing behaviour of the 
system. 

The notion of bisimulation we introduce here allows us to define a stronger security condition: 
an attacker must be able to distinguish the probabilities that two programs compute a given result 
in a given execution time. This is obviously different from being able to distinguish the probability 
distributions of the results and the running time. 

4.1 Probabilistic Time Bisimulation 

Probabilistic bisimulation was first introduced in [8] and refers to an equivalence on probability 
distributions over the states of the processes. This latter equivalence is defined as a lifting of the 
bisimulation relation on the support sets of the distributions, namely the states themselves. 

An equivalence relation $\sim\ \subseteq S \times S$ on $S$ can be lifted to a relation $\sim^* \subseteq Dist(S) \times Dist(S)$
between probability distributions on $S$ via (cf. [6, Thm 1]):

$$\mu \sim^* \nu \quad \text{iff} \quad \forall [s] \in S/\!\sim\ :\ \mu([s]) = \nu([s]).$$

It follows that $\sim^*$ is also an equivalence relation ([6, Thm 3]).
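The lifting compares the probability mass that two distributions assign to each equivalence class. A minimal sketch, assuming distributions are dictionaries from states to probabilities and the quotient is given explicitly as a list of sets (the name `lifted_equivalent` and the tolerance are illustrative choices):

```python
from math import isclose


def lifted_equivalent(mu, nu, classes, tol=1e-12):
    """True iff mu and nu give the same mass to every equivalence class."""
    def mass(dist, block):
        # Probability of a class is the sum over its member states.
        return sum(dist.get(s, 0.0) for s in block)

    return all(isclose(mass(mu, b), mass(nu, b), abs_tol=tol) for b in classes)
```

Two distributions that put their mass on different but equivalent states are thus related, although they differ as functions on states.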



For any equivalence relation $\sim$ on the set Conf of configurations, we define the associated low
equivalence relation $\sim_L$ by $c_1 \sim_L c_2$ if $c_1 \sim c_2$ and $c_1 =_L c_2$. Obviously $\sim_L$ is again an equivalence
relation. We can lift a low equivalence $\sim_L$ to $(\sim_L)^*$ which we simply denote by $\sim^*_L$.

Definition 4. Given a security typing $\Gamma$, a probabilistic time bisimilarity $\sim$ is the largest symmetric
relation on configurations such that whenever $c_1 \sim c_2$, then

$c_1 \Rightarrow \chi_1$ implies that there exists $\chi_2$ such that $c_2 \Rightarrow \chi_2$ and $\chi_1 \sim^*_L \chi_2$.

We say that two configurations are probabilistic time bisimilar or PT-bisimilar, $c_1 \sim c_2$, if
there exists a probabilistic time bisimilarity relation in which they are related.

This definition generalises the one in [3] which only applies to deterministic transition systems.
Note that there is a difference between $\sim^*_L = (\sim_L)^*$ and $(\sim^*)_L$; in fact, only the former is able
to take into account the correlation between time and low variables, while the latter would be
a straightforward generalisation of the time bisimulation in [3] which is unable to model such a
correlation.

We now exploit the notion of bisimilarity introduced above in order to introduce a security 
property ensuring that a system is confined against any combined attacks based on both timing 
and probabilistic covert channels. 

Definition 5. A pWhile program $P$ is probabilistic time secure or PT-secure if for any pair of
initial states $E$ and $E'$ such that $E_L = E'_L$, we have $\langle E \mid P \rangle \sim \langle E' \mid P \rangle$.

5 Computing Approximate Bisimulation 

The papers [9, 10] introduce an approximate version of bisimulation and confinement where the
approximation can be used as a measure $\varepsilon$ for the information leakage of the system under analysis.
The quantity $\varepsilon$ is formally defined in terms of the norm of a linear operator representing the
partition induced by the 'minimal' bisimulation on the set of the states of a given system, i.e. the
one minimising the observational difference between the system's components. We show here how
to compute a non-trivial upper bound $\delta$ to $\varepsilon$ by essentially exploiting the algorithmic solution
proposed by Paige and Tarjan [11] for computing bisimulation equivalence. This was already
adapted to PTS's in [12], where it was used for constructing a padding algorithm as part of a
transformational approach to the timing leaks problem. In this approach the computational paths
of a program are transformed so as to make it perfectly secure by eliminating any possible timing
covert channel while preserving its I/O behaviour.

The algorithm we present here is an instantiation of that algorithm where the abstract labels are 
replaced by the statements in a concrete language (pWhile) and their execution times. Moreover, 
instead of transforming the execution trees, our algorithm accumulates the information about the 
difference between their transition probabilities and uses this information to compute an upper 
bound $\delta$ to the maximal information leakage of the given program.

5.1 Computing $\delta$ for PT-Bisimulation

Algorithm 2 describes a procedure that can be used inside an algorithm for constructing a lumping
(i.e. a PT-bisimulation equivalence) of two tPTS's $T_1$ and $T_2$. In particular, Algorithm 1 refers to
such a procedure which follows the algorithmic paradigm for partition refinement introduced by
Paige and Tarjan in [11] (see also [13, 14]). The Paige-Tarjan algorithm constructs a partition of
a state space $\Sigma$ which is stable for a given transition relation $\to$. It is a well-known result that
this partition corresponds to a bisimulation equivalence on the transition system $(\Sigma, \to)$. The
refinement procedure used in the algorithm consists in splitting the blocks in a given partition $P$
by replacing each block $B \in P$ with $B \cap pre(S)$ and $B \setminus pre(S)$, where $S \subseteq \Sigma$ and
$pre(X) = \{ s \in \Sigma \mid s \to x \text{ for some } x \in X \}$.
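A single refinement step of this kind can be sketched in a few lines. The encoding is an assumption made for illustration: the transition relation is a set of `(source, target)` pairs without labels, blocks are Python sets, and empty blocks produced by a split are dropped.

```python
def pre(transitions, targets):
    """States with at least one transition into the set `targets`."""
    return {s for (s, x) in transitions if x in targets}


def refine(partition, splitter, transitions):
    """Split every block B into B & pre(splitter) and B - pre(splitter)."""
    predecessors = pre(transitions, splitter)
    refined = []
    for block in partition:
        inside, outside = block & predecessors, block - predecessors
        # Keep only non-empty parts so the result is again a partition.
        refined.extend(part for part in (inside, outside) if part)
    return refined
```

With transitions `1 -> 3` and `2 -> 4`, the partition `[{1, 2}, {3, 4}]` and splitter `{3}`, the block `{1, 2}` is split into `{1}` and `{2}` while `{3, 4}` is left intact.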



Algorithm 1 Algorithm for detecting critical blocks

procedure QLumping(T1, T2)
    Assume: T1 execution tree with states S1, and T2 execution tree with states S2
    P ← {S1 ∪ S2}                                   ▷ (Initial) Partition
    while n < Height(T1 ⊕ T2) do
        S ← {B ∩ Cutoff(T1 ⊕ T2, n) | B ∈ P}        ▷ Splitters (below)
        while S ≠ ∅ do
            choose B ∈ S, S ← S \ B                 ▷ Choose a splitter
            P ← Splitting(B, P)                     ▷ Split partition
        end while
        L1 ← Layer(T1, n), L2 ← Layer(T2, n)
        CompDelta(L1, L2)
        n ← n + 1                                   ▷ Go to next level
    end while
    return δ
end procedure

Algorithm 2 Algorithm for computing δ

 1: procedure CompDelta(L1, L2)
 2:     while L1 ≠ ∅ do                             ▷ For all s1 ∈ L1
 3:         choose s1 ∈ L1, L1 ← L1 \ s1
 4:         β ← ∞
 5:         L ← L2
 6:         while L ≠ ∅ do                          ▷ For all s2 ∈ L2
 7:             choose s2 ∈ L, L ← L \ s2
 8:             β ← min(β, ‖χ(s1) − χ(s2)‖∞)        ▷ Find best match
 9:         end while
10:         δ ← max(δ, β)
11:     end while
12: end procedure

In order to check whether two execution trees $T_1$ and $T_2$ in our tPTS model are PT-bisimilar,
in Algorithm 1 we apply this refinement technique to the set of states formed by the disjoint union
of the states in $T_1$ and $T_2$. The strategy of our lumping procedure QLumping($T_1$, $T_2$) is as follows:
it proceeds iteratively layer by layer starting from the leaves layer, and splits the blocks in the
current partition restricted to the current layer. The procedure CompDelta($L_1$, $L_2$) computes
for each two layers $L_1$ and $L_2$, the maximal difference $\|\chi(s_1) - \chi(s_2)\|_\infty$ between the probabilities
to get from states in $T_1 \cap L_1$ and $T_2 \cap L_1$, respectively, into states of layer $L_2$. In the original
lumping procedure this would determine a splitting of the states in layer $L_1$. This value is stored
in a variable $\beta$ and compared with the current value of a variable $\delta$ which contains the maximal
difference up to that iteration. When the lumping algorithm terminates (that is when we have
reached the root of the union tree), one of the following situations will occur: either the roots of
$T_1$ and $T_2$ belong to the same class in the constructed partition (i.e. $T_1$ and $T_2$ are PT-bisimilar)
or not. In the latter case $\delta$ will contain a maximal difference in the transition probabilities of the
two processes which makes them non-bisimilar. This is therefore an estimate of the information
leakage of the system. Note that, by construction, $\delta$ will be zero in the first case.
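The core of CompDelta is a max-min computation over two sets of states. The following Python sketch mirrors the pseudocode under stated assumptions: `chi` is a hypothetical dictionary mapping each state to its joint distribution, itself a dictionary from `(time, class)` pairs to probabilities, and the running maximum `delta` is passed in and returned rather than kept in a global variable.

```python
def sup_norm(chi1, chi2):
    """Largest absolute difference between corresponding entries."""
    keys = set(chi1) | set(chi2)
    return max((abs(chi1.get(k, 0.0) - chi2.get(k, 0.0)) for k in keys),
               default=0.0)


def comp_delta(layer1, layer2, chi, delta=0.0):
    """For each s1 find the best match s2, then keep the worst best match."""
    for s1 in layer1:
        beta = float("inf")
        for s2 in layer2:
            beta = min(beta, sup_norm(chi[s1], chi[s2]))
        delta = max(delta, beta)
    return delta
```

If some state in `layer2` has exactly the same distribution as `s1`, the best match is 0 and `s1` does not contribute to `delta`; with an empty `layer2` the sketch returns infinity, a corner case the pseudocode leaves open.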

The strategy for constructing the lumping described above determines the coarsest partition of
a set which is stable with respect to a given relation [13, 14], that is in our case the coarsest PT-bisimulation
equivalence. Obviously, this does not necessarily coincide with the 'minimal' one corresponding to
the quantity $\varepsilon$ defined in [9]. Thus, $\delta$ will be in general only a safe approximation, namely an upper
bound to the capacity of the probabilistic timing covert channel defined by $\varepsilon$. The following proposition
is therefore a corollary of Proposition 45 in [9] stating a similar assertion for $\varepsilon$-bisimulation.



Proposition 1. $P$ is PT-secure iff for any pair of initial configurations $c_1, c_2$ the corresponding
execution trees $T_1$ and $T_2$ are such that QLumping($T_1$, $T_2$) returns $\delta = 0$.

5.2 A Weighted Version: $\delta'$

The actual value of $\delta$ is determined by the way we compute the best match between the joint
probability distributions $\chi(s_1)$ and $\chi(s_2)$ in line 8 of CompDelta($L_1$, $L_2$). In order to compute $\delta$
we use the supremum norm, $\|\cdot\|_\infty$, between two distributions, i.e. the largest absolute difference
between corresponding entries in $\chi(s_1)$ and $\chi(s_2)$, respectively. In other words, we try to identify
a class of states $C$ (in the layer below) and a time interval $t$ such that the probability of reaching
this class in that time from $s_1$ differs maximally from the one for $s_2$.

One can argue that this is a fair approach as we treat all classes and time labels the same way. 
However, it might be useful to develop a measure which reflects the fact that certain times and 
classes are more similar than others. 

From the point of view of the attacker, such a measure would encode his/her ability in detecting
similarity as given by the nature and the precision of the instruments he/she is actually using. For
example, suppose it is possible to reach the same class $C$ from $s_1$ and $s_2$ with different times $t_1$
and $t_2$, such that the corresponding probabilities determine $\delta$ (i.e. we have the maximal difference
in this case). However, we might in certain circumstances also want to express the fact that $t_1$
and $t_2$ are more or less similar, e.g. for $t_1 = 10$ and $t_2 = 10.5$ we might want a smaller $\delta'$ than for
$t_1 = 1$ and $t_2 = 100$. In terms of the attacker, this means that we make our estimate dependent
on the actual power of the time detection instrument that he/she possesses.

In order to incorporate similarity of times and/or classes we need to modify the way we
determine the best match in line 8 of CompDelta($L_1$, $L_2$). Instead of determining the norm
between $\chi(s_1)$ and $\chi(s_2)$ we can compute a weighted version as:

$$\beta \leftarrow \min(\beta, \|\omega \cdot \chi(s_1) - \omega \cdot \chi(s_2)\|_\infty) = \min(\beta, \|\omega \cdot (\chi(s_1) - \chi(s_2))\|_\infty),$$

where $\omega$ re-scales the entries in $\chi(s_1)$ and $\chi(s_2)$ so as to reflect the relative importance of certain
times and/or classes. Note that "$\cdot$" denotes here the component-wise and not the matrix multiplication:
$(\omega \cdot \chi)_{tC} = \omega_{tC}\, \chi_{tC}$. If, for example, an attacker is not able to detect the absolute difference
between times but can only measure multiplicities expressing approximative proportions, we could
re-scale the $\chi$'s via $\omega_{tC} = \log(t)$.
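The weighted comparison only changes the norm used in line 8 of CompDelta. A minimal sketch, assuming distributions are dictionaries keyed by `(time, class)` pairs and the weight $\omega$ is supplied as a function of such a key (both the encoding and the names are illustrative):

```python
from math import log


def weighted_sup_norm(chi1, chi2, weight):
    """Sup norm of the component-wise weighted difference of chi1 and chi2."""
    keys = set(chi1) | set(chi2)
    return max((abs(weight(k) * (chi1.get(k, 0.0) - chi2.get(k, 0.0)))
                for k in keys), default=0.0)


def log_time_weight(key):
    """Weight omega_tC = log(t); requires t > 0, and gives 0 for t = 1."""
    t, _cls = key
    return log(t)
```

Because the weight is applied entry by entry, weighting each distribution first and then subtracting gives the same value as weighting the difference, which is the equality stated above. Taking `weight` to be the constant function 1 recovers the unweighted supremum norm.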

In the following we will use a weighted version $\delta'$ which reflects the similarity of classes. The
idea is to weight according to the "replaceability" of a class. To this purpose we associate to every
class (in the layers below) a matching measure $\mu(C) = \min_{C' \neq C} \delta'(C, C')$, i.e. we determine the
$\delta'$ between a (sub)tree with a root in the class $C$ in question and all (sub)trees with roots in any
of the other classes $C'$. We can take any representative of the classes $C$ and $C'$ as these are by
definition bisimilar. The measure $\mu$ indicates how easy it is to replace class $C$ by another one, or
how good/precise the attacker is in distinguishing successor states. Then $\delta'$ is simply the weighted
version of $\delta$ as described above with $\omega_{tC} = \mu(C)$. Note that there is no problem with the fact that
$\delta'$ is defined recursively as we always know the $\delta'$ in the layers below before we compute $\delta'$ in the
current layer.

Example 2. In order to illustrate how $\delta$ and $\delta'$ quantify the difference between various execution
trees, let us consider the following four trees.

[execution trees $T_1$, $T_2$, $T_3$, $T_4$]

We abstract from the influence of different transition times and individual state labels, i.e. we
assume that $t = 1$ for all transitions and that all states are labelled with the same label.

If we compute the $\delta$ and $\delta'$ values between all the pairs of systems we get the following results:

  δ     T1     T2     T3     T4
  T1   0.000  0.500  1.000  0.000
  T2   0.500  0.000  1.000  0.500
  T3   1.000  1.000  0.000  1.000
  T4   0.000  0.500  1.000  0.000

  δ'    T1     T2     T3     T4
  T1   0.000  0.250  0.125  0.000
  T2   0.250  0.000  0.125  0.250
  T3   0.125  0.125  0.000  0.125
  T4   0.000  0.250  0.125  0.000



From this we see that $\delta$ and $\delta'$ are symmetric, i.e. the difference between two systems is symmetric;
that every system is bisimilar with itself, i.e. $\delta = 0 = \delta'$ (as we have an empty diagonal); and
that the difference between two systems is between zero and one with values in between very well
possible.

6 Cost Analysis 

In a recent article on "Software Bugtraps" in The Economist the authors report on some ongoing 
research at NIST on "Software Assurance Metrics and Tool Evaluation" [15]. They claim that
"The purpose of the research is to get away from the feeling that 'all software has bugs' and say 'it
will cost this much money to make software of this kind of quality'". They then conclude: "Rather
than trying to stamp out bugs altogether, in short, the future of 'software that makes software
better' may lie in working out where the pesticide can be most cost-effectively applied".

Our aim is to introduce "cost factors" in a similar way into computer security. Instead of trying 
to achieve perfect security we will look at the trade-off between costs of security counter measures 
- such as increased average running time - and the improvement in terms of security, which we can 
measure via the $\delta$ introduced above. Even in simple examples we are able to exhibit interesting
effects. 

6.1 Security Typing 

In [3] Agat introduces a program transformation to remove covert timing channels (timing leaks)
from programs written in a sequential imperative programming language. The language used is 
a language of security types with two security levels that is based on earlier work by Volpano 
and Smith [16,1]. Whilst Volpano and Smith restrict the condition in both while-loops and if- 
commands to being of the lowest security level, Agat allows the condition in an if-command to be 
high security providing that an external observer cannot detect which branch was taken. He shows 
that if a program is typeable in his system, then it is secure against timing attacks. This result 
depends critically on a notion of bisimulation; an if-command with a high security condition is 
only typeable if the two branches are bisimilar. Agat's notion of bisimilarity is timing aware and 
based on a notion of low-equivalence which ensures stepwise non-interference. He does not give an 
algorithm for bisimulation checking. 

If a program fails to type, Agat presents a transformation system to remove the timing leak. 
The transformation pads the branches of if-commands with high security conditions with dummy 
commands. The objective of the padding is that both branches end up with the same timing and 
thus become indistinguishable by an external observer. The transformation utilises the concept 
of a low-slice: for a given command C, its low-slice C_L has the same syntactic structure as C
but only has assignments to low security variables; all assignments to high security variables and 
branching on high security conditions are replaced by skip commands of appropriate duration. 
The transformation involves extending the branches in a high security if-command by adding 
the low-slice from the other branch. The effect of this transformation is that the execution
times of both branches are the same and equal to the sum of the execution times of the two branches
in the untransformed program. Agat demonstrates that the transformation is semantically sound and
that transformed programs are secure (correctness). 
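To make the padding idea concrete, here is a minimal Python sketch for straight-line branches only (no nested conditionals or loops). The representation of commands as (kind, variable) pairs, the helper names, the choice of high variables and the order in which the low slice is attached are all assumptions made for illustration; the durations are the execution times used in the example of Section 6.3. It is not Agat's transformation system itself.

```python
# Assumed setting: s and k are high variables. Durations follow the example
# of Section 6.3 (assignment 3 ticks, skip 1 tick); a skipAsn is assumed to
# take exactly as long as the assignment it replaces.
HIGH_VARS = {"s", "k"}
DURATION = {"asn": 3, "skipAsn": 3, "skip": 1}


def duration(branch):
    """Total running time of a straight-line branch."""
    return sum(DURATION[kind] for kind, _ in branch)


def low_slice(branch):
    """Replace assignments to high variables by dummy commands of equal duration."""
    return [("skipAsn", var) if kind == "asn" and var in HIGH_VARS else (kind, var)
            for kind, var in branch]


def pad(then_branch, else_branch):
    """Extend each branch by the low slice of the other branch."""
    return (then_branch + low_slice(else_branch),
            low_slice(then_branch) + else_branch)


if __name__ == "__main__":
    then_branch = [("asn", "s")]     # s := s
    else_branch = [("skip", None)]   # skip
    padded_then, padded_else = pad(then_branch, else_branch)
    print(duration(padded_then), duration(padded_else))
```

With these durations both padded branches take 3 + 1 = 4 ticks, i.e. the sum of the two original branch times, so the branch taken can no longer be inferred from the running time.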
Table 2. Security Typing Rules



In order to extend this system to our language, we only have to add a rule for the choose
statement (essentially a straightforward extension of the rule for if). In detail, we present the
typing rules in Table 2. Note that the rule (If_H) refers to the semantic notion of timed bisimilarity
(as introduced in Section 4.1).



6.2 Probabilistic Transformation 



We consider a probabilistic variant of Agat's language. Probabilities play an important role in the
transformation. Rather than just adding the low slice from the other branch to each branch of
a high security conditional, we transform each branch to make a probabilistic choice between its
padded and untransformed variant. This allows us to trade off the increased run-time of the padded
program against the vulnerability to attack of the untransformed program. The transformation
described is just one of a whole spectrum of probabilistic transformations - at the other extreme
we could probabilistically decide whether or not to execute each command in the low slice. All the
formal transformation rules for probabilistic padding are the same as in [3]. The only exception is
the rule (If_H): Here we replace - provided certain typing conditions are fulfilled - the branches of
an if statement not just by the correctly "padded" version as in [3]; instead we introduce in every
branch a choice such that the secure replacement will be executed only with probability p while
with probability 1 - p the original code fragment will be executed.

Table 3. Probabilistic Program Transformation

In order to transform programs into secure versions we need to introduce an auxiliary notion,
namely the notion of global effect ge(C) of commands. This is used to identify (global) variables
which might be changed when a command C is executed. Here is its formal definition:

    ge(x := e) = {x}
    ge(C1; C2) = ge(C1) ∪ ge(C2)
    ge(if (e) then C1 else C2) = ge(C1) ∪ ge(C2)
    ge(while (e) do C) = ge(C)
    ge(choose^p C1 or C2) = ge(C1) ∪ ge(C2)
    ge(skipAsn x e) = ∅
    ge(skipIf e C) = ge(C).
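The definition of the global effect is a plain structural recursion and can be sketched directly in Python. The encoding of commands as tagged tuples is an assumption made for this illustration only; commands not covered by the definition (for instance a bare skip) are deliberately rejected rather than guessed.

```python
# Hypothetical tuple encoding of commands:
#   ("asn", x, e)            x := e
#   ("seq", c1, c2)          c1; c2
#   ("if", e, c1, c2)        if (e) then c1 else c2
#   ("while", e, c)          while (e) do c
#   ("choose", p, c1, c2)    probabilistic choice between c1 and c2
#   ("skipAsn", x, e)        dummy assignment
#   ("skipIf", e, c)         dummy conditional
def ge(cmd):
    """Return the set of variables that may be changed by executing cmd."""
    tag = cmd[0]
    if tag == "asn":
        return {cmd[1]}
    if tag == "seq":
        return ge(cmd[1]) | ge(cmd[2])
    if tag == "if":
        return ge(cmd[2]) | ge(cmd[3])
    if tag == "while":
        return ge(cmd[2])
    if tag == "choose":
        return ge(cmd[2]) | ge(cmd[3])
    if tag == "skipAsn":
        return set()
    if tag == "skipIf":
        return ge(cmd[2])
    raise ValueError(f"no global-effect clause for command {tag!r}")


if __name__ == "__main__":
    body = ("seq",
            ("if", "k[i]==1", ("asn", "s", "s"), ("skipAsn", "s", "s")),
            ("asn", "i", "i+1"))
    program = ("seq", ("asn", "i", "1"), ("while", "i<=3", body))
    print(sorted(ge(program)))
```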

The judgments or transformation rules in Table 3 are of the general form:

    Γ ⊢ C ↪ D | D_L

which represents the fact that with a certain (security) typing Γ we can transform the statement
C into D - we also record as a by-product the so-called low slice D_L of D.

6.3 An Example 

Our probabilistic version of Agat's padding algorithm allows us to obtain partially fixed programs. 
Depending on the parameter p with which we introduce empty low slices to obfuscate the timing 
leaks we can determine the (average) execution time of the fixed program in comparison with the 
improvement in security. 

Agat presents in his paper [3] an example which itself is based on Kocher's study [2] of timing
attacks against the RSA algorithm. In order to illustrate our approach we simplify the example
slightly: The insecure program agat we start with is depicted on the left side in Table 4. The
fully padded version Agat's algorithm produces, fagat, is on the right hand side of Table 4 (to
keep things simple we omit Agat's empty statements like skipAsn s s; as skip as well as s:=s
can be used just to 'spend time' without having any real effect on the store we can use e.g.
s:=s in place of Agat's skipAsn s s). The program pagat, presented in the middle of Table 4,
is the result of probabilistic padding: The original program agat is transformed in such a way
that the compensating statements, i.e. low slices, are executed only with probability p while with
probability q = 1 - p the original code is executed. For p = 0 we have the same behaviour as
the original program agat while for p = 1 this program behaves in the same way as Agat's fully
padded version fagat.

agat                    pagat                           fagat

i := 1;                 i := 1;                         i := 1;
while i<=3 do           while i<=3 do                   while i<=3 do
  if k[i]==1 then         if k[i]==1 then                 if k[i]==1 then
    s := s;                 choose p: s := s; skip          s := s; skip
  else                      or q: s := s                  else
    skip;                   ro                              s := s; skip
  fi;                     else                            fi;
  i := i+1;                 choose p: s := s; skip        i := i+1;
od;                         or q: skip                  od;
                            ro
                          fi;
                          i := i+1;
                        od;

Table 4. Versions of Agat's Program: agat, pagat, and fagat

In our concrete experiments we used the following assumptions. The variable i can take values
in {1, ..., 4} while k is an array of three elements with values in {0, 1} - nothing is concretely
assumed about s. The variables k, representing a secret key, and s have security typing H, while
i is the only low variable which can be observed by an attacker. We implemented this example
using (arbitrary) execution times: t_asn = 3 (assign time), t_br = 2 (test/branch time), t_skip = 1
(skip time), and t_ch = 0 (choice time).

The abstract semantics for the pagat program - which only records choice points and the 
moments in time when the low variable changes its value - produces the following execution trees 
if we start with keys k=011 and k=010: 





One can easily see from this how probabilistic padding influences the behaviour of a program: For
every bit in the key k - i.e. every iteration - we have a choice between executing the original code
with probability q = 1 - p or the 'safe' code with probability p. The new code always takes the
same time (in our case 7 ticks) while the original code's execution time depends on whether k[i]
is set or not (either 4 or 6 time steps in our case). Clearly, for p = 0 we get in every iteration
a different execution time, depending on the bit k[i], and thus can deduce the secret value k
by just observing the execution times. However, as the execution time is always the same for the
replacement code, it is impossible to do the same for p = 1. As p increases from 0 to 1, the
(average) execution times for k[i] = 0 and k[i] = 1 become more and more similar. This means
in practical terms that the attacker has to spend more and more time (i.e. repeated observations
of the program) in order to determine with high confidence the exact execution time and thus
deduce the value of k[i] (cf. e.g. [9]).

Fig. 1. Running Time t(p) and Security Level δ'(p) as Functions of p

The price we have to pay for increased security, i.e. indistinguishability of behaviours, is an
increased (average) execution time. The graph on the left in Figure 1 shows how the running
time (vertical axis) increases with the padding probability p (horizontal axis) for the
eight execution trees we have to consider in this example, i.e. for k = 000, k = 001, k = 010, etc.
Depending on the number of bits set in k we get four different curves which show how, for example
for k = 000, the running time increases from 29 time steps (for p = 0, i.e. agat program) to 38 (for
p = 1, i.e. fagat program).
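The dependence of the average running time on p can be reproduced with a few lines of Python. The sketch below assumes assignment, test and skip times of 3, 2 and 1 ticks, a choice time of 0, and that the only operations charged are the initial assignment, the loop and branch tests, the (padded or original) branch body and the increment of i; the function name is hypothetical. Under these assumptions the values 29 and 38 quoted above for k = 000 are recovered.

```python
# Assumed execution times (ticks): assignment, test/branch, skip, choice.
T_ASN, T_BR, T_SKIP, T_CH = 3, 2, 1, 0


def expected_running_time(key, p):
    """Average running time of the partially padded program for a key such as "010".

    With probability p the padded body (s := s; skip) is executed,
    with probability 1 - p the original body (s := s or skip).
    """
    padded = T_ASN + T_SKIP
    total = T_ASN                       # i := 1
    for bit in key:
        original = T_ASN if bit == "1" else T_SKIP
        total += T_BR + T_BR + T_CH     # loop test, branch test, choice
        total += p * padded + (1 - p) * original
        total += T_ASN                  # i := i+1
    total += T_BR                       # final, failing loop test
    return total


if __name__ == "__main__":
    for key in ("000", "001", "011", "111"):
        print(key, [expected_running_time(key, p) for p in (0, 0.5, 1)])
```

In this model the curve only depends on the number of bits set in the key, which is why four curves suffice for eight keys, and all curves meet at 38 ticks for p = 1.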

We can employ the bisimilarity measures δ and δ' in order to determine the security of the
partially padded program. For this we compute, using our algorithm, δ(k_i, k_j) and δ'(k_i, k_j) for all
possible keys, i.e. i, j = 0, ..., 7. It turns out that δ = 1 for all values of p < 1 and any pair of
keys k_i and k_j with i ≠ j; only for p = 1 we get, as one would expect, δ = 0 for all key pairs.
The weighted measure δ' is more sensitive and we get for example for p = 0.5 the following values
when we compare k_i and k_j:



5' 



GGG 
001 
010 
Oil 
100 
101 
110 
111 



000 001 010 Oil 100 101 110 111 



0.000 
0.125 
0.250 
0.125 
0.500 
0.125 
0.250 
0.125 



0.125 
0.000 
0.125 
0.250 
0.125 
0.500 
0.125 
0.250 



0.250 
0.125 
0.000 
0.125 
0.250 
0.125 
0.500 
0.125 



0.125 
0.250 
0.125 
0.000 
0.125 
0.250 
0.125 
0.500 



0.500 
0.125 
0.250 
0.125 
0.000 
0.125 
0.250 
0.125 



0.125 0. 
0.500 0. 
0.125 0. 
0.250 0. 
0.125 0. 
0.000 0. 
0.125 0. 
0.250 0. 



250 0.125 
125 0.250 
500 0.125 
125 0.500 
250 0.125 
125 0.250 
000 0.125 
125 0.000 



The diagonal entries are, of course, all zero as every execution tree is bisimilar to itself. The other
entries however are different from 0 and 1 and reflect the similarity between the two keys and thus
the resulting execution trees. If we plot the development of δ' as a function of p we observe only
three patterns as depicted in the right graph in Figure 1. In all three cases δ' decreases from an
original value 1 to 0, but in different ways.
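The regular structure of the table for p = 0.5 can be made explicit. In the printed values an entry appears to depend only on the rightmost position (in the written bit string) at which the two keys differ: 0.125 if the last bits differ, 0.250 if the keys agree there but differ in the middle bit, and 0.500 if they differ only in the first bit. This is an observation about the printed numbers, not a formula for δ'; the Python sketch below, with hypothetical names, merely checks it against all 64 entries.

```python
KEYS = [format(n, "03b") for n in range(8)]   # "000", "001", ..., "111"

# delta' for p = 0.5 as printed above (rows and columns ordered as in KEYS).
TABLE = [
    [0.000, 0.125, 0.250, 0.125, 0.500, 0.125, 0.250, 0.125],
    [0.125, 0.000, 0.125, 0.250, 0.125, 0.500, 0.125, 0.250],
    [0.250, 0.125, 0.000, 0.125, 0.250, 0.125, 0.500, 0.125],
    [0.125, 0.250, 0.125, 0.000, 0.125, 0.250, 0.125, 0.500],
    [0.500, 0.125, 0.250, 0.125, 0.000, 0.125, 0.250, 0.125],
    [0.125, 0.500, 0.125, 0.250, 0.125, 0.000, 0.125, 0.250],
    [0.250, 0.125, 0.500, 0.125, 0.250, 0.125, 0.000, 0.125],
    [0.125, 0.250, 0.125, 0.500, 0.125, 0.250, 0.125, 0.000],
]


def pattern(key_a, key_b):
    """Value suggested by the rightmost position at which two keys differ."""
    diff = int(key_a, 2) ^ int(key_b, 2)
    if diff == 0:
        return 0.0
    trailing_zeros = (diff & -diff).bit_length() - 1
    return 2.0 ** (trailing_zeros - 3)   # 1/8, 1/4 or 1/2 for three-bit keys


def matches_pattern():
    """True if every printed entry agrees with the observed pattern."""
    return all(TABLE[i][j] == pattern(KEYS[i], KEYS[j])
               for i in range(8) for j in range(8))


if __name__ == "__main__":
    print(matches_pattern())
```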

In analysing the trade-off between increased running time and security we need to define a cost
function. For example, one could be faced with a situation where a certain code fragment needs to
be executed in a certain maximal time, i.e. there is a (cost) penalty if the execution takes longer
than a certain number of micro-seconds. In our case we will consider a very simple cost function
c(p) = Q δ'(p) + t(p), where δ'(p) is the average δ' between all possible execution trees and t(p)
the average running time. The diagram in Figure 2 depicts how c(p), δ'(p) and t(p) depend on the
padding parameter p.

Fig. 2. Trade-Off and Costs c(p) as Functions of p

One can argue about the practical relevance of the particular cost function we chose. Nevertheless,
this example already illustrates nicely the non-linear nature of security cost optimisation:
The optimal, i.e. minimal, cost is reached in this case for p = 0.5, i.e. keeping the cost of security
counter measures in mind it is better to use a "half-fixed" program rather than a completely safe
one.


7 Related and Further Work 



The idea of defining a secure system via the requirement that an attacker must be unable to
observe different behaviours as a result of different secrets - i.e. the system "operates in the same
way" whatever value a secret key has - goes back at least to the seminal work of Goguen and
Meseguer [17].

This led in a number of settings to formalisations of security concepts such as "non-interference"
via various notions of behavioural equivalences (see e.g. [18, 19]). One of the perhaps most
prominent of these equivalence notions, namely bisimilarity, plays an important role in the context
of security of concurrent systems but has also found application for sequential programs such as in
Agat's work (as the interaction between system and attacker can be modelled as a parallel composition).
In order to allow for a decision theoretic analysis of security counter-measures and associated
efforts it appears to be desirable to introduce a "quantitative" notion of the underlying behavioural
equivalence. In the case of bisimilarity a first step was the introduction of the notion of
probabilistic bisimulation by Larsen and Skou [8]. However, this notion turns out to be still too strict
and a number of researchers developed "approximate" versions; among them we just name the
approaches by Desharnais et al. [20,21] and van Breugel [22] and our work [10,24] (an extensive
bibliography on this issue can be found in [23]). We based this current paper on the latter
approach because it allows for an implementation of the semantics of pWhile via linear operators,
i.e. matrices, and an efficient computation of δ and δ' using standard software such as octave [25].

Further research will be needed in order to clarify the relation between our measures δ and
existing notions of approximate bisimilarity mentioned above, e.g. the ε in [9]. Furthermore, we
also would like to shed more light on the relationship between our notion and information theoretic
concepts used in the work of, for example, Clark et al. [26] and Boreale [27].